1. OVERVIEW 


1.1 Introduction 


This document is the application guide for the Logos2 series PCIe DMA from Shenzhen Ziguang
Tongchuang Electronics Co., Ltd. It introduces the design architecture, interface definition,
interface timing, supported devices and reference design of the PCIe DMA.


1.2 Main features 
PCIe DMA converts the AXI4-Stream interface into a RAM read and write interface, which is
convenient for customers to use. The main functions are as follows:
» Supports Gen1x1, Gen1x2, Gen1x4, Gen2x1, Gen2x2, Gen2x4;
» Supports DMA Mrd;
» Supports DMA Mwr;
» Supports 1DW PIO;
» Supports read and write data lengths of 4 to 4096 bytes (in DW units).


1.3 Design information 


Table 1 Design Information

PCIE DMA APPLICATION GUIDE
Support device | Logos 2 Series FPGA Products
Support user interface | RAM interface

DESIGN FILES PROVIDED
PCIe-DMA Design Files | encrypted file
PCIe-DMA Reference Design | encrypted file
PCIe-DMA simulation file | encrypted file
Constraints file | fdc file

DEVELOPMENT TOOL SUPPORT
Design Tools | PDS Development Kit; supports Pango Design Suite 2021.1-SP7.3
Hardware platform | This program is implemented on the PO41100KFO1 A2 development board
Synthesizer tool | ADS
Simulation tool | Third-party simulation software

1.4 Resource usage


Table 2 Resource Utilization Rate 


PG2L100H 4 74 2563 | 1796 0 


2. FUNCTIONAL DESCRIPTION 


The PCIe DMA application scenario is as follows:


Figure 1 PCIe DMA application scenario


Function: PCIe DMA converts the AXI4-Stream interface into a RAM read and write interface, which is
convenient for customers to use.


Main functions: PCIe (AXI4-Stream) DMA implements the memory transfer service; please refer to
chapter 1.2 for the specific functions.




2.1 PCIe DMA design architecture 


[Block diagram: inside the FPGA, the PCIe DMA contains rt01_rx_engine (with rt0101_tlp_rcv and
rt0102_cpld_wr_ctrl) and rt02_tx_engine (with rt0201_mwr_ctrl, rt0202_mrd_ctrl and
rt0203_state_reg). On the AXIS Master side the receive path carries (1) read and write command TLPs
and (2) the Mrd completion messages (CplD) returned by the CPU; for DMA Mrd it drives wr_en,
wr_byte_en[1:0], wr_addr[31:0] and wr_data[15:0]. For DMA Mwr the transmit path drives rd_en and
rd_addr[31:0] and takes rd_data[15:0]. TLPs, including the status bit register TLP sent after
reading and writing, go to PCIe Tx through AXIS Slave0, AXIS Slave1 and AXIS Slave2.]


Figure 2 PCIe DMA functional block diagram


The main functional modules of PCIe DMA are described as follows:


PCIe DMA implements the memory transfer service function. PCIe DMA mainly includes the
rt01_rx_engine, rt0101_tlp_rcv, rt0102_cpld_wr_ctrl, rt02_tx_engine, rt0201_mwr_ctrl,
rt0202_mrd_ctrl and rt0203_state_reg modules.


» rt01_rx_engine: the top-level module of the receive engine;


» rt0101_tlp_rcv: parses the CplD TLPs and the read and write command TLPs sent from the CPU side;


» rt0102_cpld_wr_ctrl: reassembles the CplD data and controls writing it into RAM;


» rt02_tx_engine: the top-level module of the transmit engine;


» rt0201_mwr_ctrl: handles memory write operations. After receiving a memory write request, it
reads the data from RAM, then assembles the TLP and sends it through the axis_slave2 channel;


» rt0202_mrd_ctrl: handles memory read operations. After receiving a memory read request, it
controls and reorganizes the data and sends the TLP through the axis_slave1 channel;


» rt0203_state_reg: answers in real time whether the read and write data transfers are complete;
the TLP is sent through the axis_slave0 channel.
2.2 Interface list


Table 3 Interface list

Port | I/O | Width | Description

GLOBAL SIGNAL
button_rst_n | I | 1 | Reset signal, active low; button reset, for debugging only
perst_n | I | 1 | Reset signal, active low, from slot PERST#
ref_clk_n, ref_clk_p | I | 1 | HSST differential reference clock signal, 100MHz
rxn, rxp | I | 4 | HSST differential receive signal
txn, txp | O | 4 | HSST differential transmit signal
ref_led | O | 1 | ref_clk indicator; the LED flashes to indicate that the clock is normal
pclk_led | O | 1 | pclk indicator; the LED flashes to indicate that the clock is normal
pclk_div2_led | O | 1 | pclk_div2 indicator; the LED flashes to indicate that the clock is normal
smlh_link_up | O | 1 | PHY link up indication signal, pclk_div2 clock domain
rdlh_link_up | O | 1 | DLL link up indication signal, pclk_div2 clock domain

AXI4-STREAM MASTER INTERFACE
axis_master_tvalid | I | 1 | Data valid indication, active high
axis_master_tready | O | 1 | Ready to receive data indication, active high
axis_master_tdata | I | 128 | Send data bus
axis_master_tkeep | I | 4 | DW enable signal; a bit value of 1 indicates that the DW data is valid.
    Bit[0]=1: axis_master_tdata[31:0] is valid;
    Bit[1]=1: axis_master_tdata[63:32] is valid;
    Bit[2]=1: axis_master_tdata[95:64] is valid;
    Bit[3]=1: axis_master_tdata[127:96] is valid
axis_master_tlast | I | 1 | Last valid data indication, active high
axis_master_tuser | I | 8 | bit[2:0]: Reserved
    bit[3]: indicates the last CplD packet, 1'b1 is valid
    bit[6:4]: bar address
        3'b000: bar0
        3'b001: bar1
        3'b010: bar2
        3'b011: bar3
        3'b100: bar4
        3'b101: bar5
    bit[7]: rom address

AXI4-STREAM SLAVE INTERFACE
axis_slave0/1/2_tready | O | 1 | Corresponds to tready of the standard AXI4-Stream protocol
axis_slave0/1/2_tvalid | I | 1 | Corresponds to tvalid of the standard AXI4-Stream protocol
axis_slave0/1/2_tdata | I | 128 | Corresponds to tdata of the standard AXI4-Stream protocol
axis_slave0/1/2_tlast | I | 1 | Corresponds to tlast of the standard AXI4-Stream protocol
axis_slave0/1/2_tuser | I | 1 | Packet-invalid signal. When set to 1, it indicates that the current TLP packet is invalid

RELATED PARAMETER VALUES
cfg_pbus_num | I | 8 | Corresponds to the Completer Bus number in the ID.
    Completer ID[15:0] = {cfg_pbus_num, cfg_pbus_dev_num, function_num[2:0]}
cfg_pbus_dev_num | I | 5 | Corresponds to the Completer Device number in the ID
cfg_max_payload_size | I | 3 | The maximum payload size of memory write requests (Mwr) and memory read completions (CplD) after auto-negotiation.
    cfg_max_payload_size[2:0]:
        000b: 128 bytes
        001b: 256 bytes
        010b: 512 bytes
        011b: 1024 bytes
        100b to 111b: Reserved
cfg_max_rd_req_size | I | 3 | The maximum size of memory read requests (Mrd) after auto-negotiation.
        000b: 128 bytes
        001b: 256 bytes
        010b: 512 bytes
        011b: 1024 bytes
        100b to 111b: Reserved

USER LOGIC RELATED SIGNALS
DMA MRD DATA CHANNEL
o_wr_en | O | 1 | Write enable signal
o_wr_be | O | 2 | Write byte valid signal
o_wr_addr | O | 32 | Write address signal
o_wr_data | O | 16 | Write data signal
DMA MWR DATA CHANNEL
o_rd_en | O | 1 | Read enable signal
o_rd_addr | O | 32 | Read address signal
i_rd_data | I | 16 | Read back data
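
The bit fields in Table 3 can be illustrated with a few small helpers. The following Python sketch is only an illustration of the field layouts listed above; the function names are hypothetical and are not part of the delivered design files.

```python
from __future__ import annotations

# Size codes listed in Table 3 for cfg_max_payload_size and the
# maximum read request size; codes 100b to 111b are reserved.
SIZE_BYTES = {0b000: 128, 0b001: 256, 0b010: 512, 0b011: 1024}


def completer_id(bus_num: int, dev_num: int, func_num: int) -> int:
    """Pack {bus[7:0], dev[4:0], func[2:0]} into Completer ID[15:0]."""
    if not (0 <= bus_num < 256 and 0 <= dev_num < 32 and 0 <= func_num < 8):
        raise ValueError("field out of range")
    return (bus_num << 8) | (dev_num << 3) | func_num


def decode_size(code: int) -> int:
    """Return the size in bytes for a 3-bit code; reserved codes raise."""
    if code not in SIZE_BYTES:
        raise ValueError(f"reserved or invalid size code: {code}")
    return SIZE_BYTES[code]


def valid_dwords(tkeep: int) -> list[int]:
    """List the DW indices of a 128-bit beat that tkeep[3:0] marks valid."""
    return [i for i in range(4) if (tkeep >> i) & 1]


def decode_tuser(tuser: int) -> dict:
    """Split axis_master_tuser[7:0] into the fields listed in Table 3."""
    return {
        "last_cpld": bool((tuser >> 3) & 1),  # bit[3]
        "bar": (tuser >> 4) & 0b111,          # bit[6:4], 0..5 = bar0..bar5
        "rom": bool((tuser >> 7) & 1),        # bit[7]
    }
```

For example, a beat with tkeep = 4'b0111 carries valid data in DW0 to DW2, and bus 1, device 2, function 3 pack to a Completer ID of 0x0113.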


2.4 Interface timing 


2.4.1 User Logic Interface Timing 


[Timing diagram: the write channel shows o_wr_en, o_wr_be, o_wr_addr[31:0] (addr0 to addr4) and
o_wr_data (data0 to data4); the read channel shows o_rd_en, o_rd_addr[31:0] (addr0 to addr4) and
i_rd_data (data0 to data4).]


Figure 3 User Logic Interface Timing


Note: When PCIe DMA reads data, the user logic must return the read data one clock cycle after the
read address o_rd_addr is issued.
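
The one-cycle read latency in the note can be modelled behaviourally. The sketch below is a Python model written for illustration, not the user logic shipped with the reference design; the class name is hypothetical, and treating the address as a word index into a 16-bit wide memory is an assumption.

```python
class OneCycleRam:
    """Behavioural model of a 16-bit RAM with registered read data."""

    def __init__(self, depth: int) -> None:
        self.mem = [0] * depth
        self.rd_data = 0  # value visible on i_rd_data in the current cycle

    def clock(self, rd_en: bool, rd_addr: int, wr_en: bool = False,
              wr_addr: int = 0, wr_data: int = 0) -> int:
        """Apply one clock edge and return i_rd_data as seen after it."""
        if rd_en:
            # The address presented before this edge is sampled here, so
            # its data becomes visible one cycle after the address.
            self.rd_data = self.mem[rd_addr]
        if wr_en:
            self.mem[wr_addr] = wr_data & 0xFFFF  # o_wr_data is 16 bits
        return self.rd_data
```

In this model the read data holds its last value while rd_en is low, which is one common choice for a registered RAM output; the source does not state what the DMA expects in idle cycles.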
2.4.2 AXI4-Stream Master interface timing


Figure 4 AXI4-Stream Master 4DW Posted operation timing


Figure 5 AXI4-Stream Master 3DW Posted operation timing


Figure 6 AXI4-Stream Master 4DW Non-Posted operation timing


Figure 7 AXI4-Stream Master 3DW Non-Posted operation timing


2.4.3 AXI4-Stream Slave interface timing


Figure 9 AXI4-Stream Slave 3DW Posted operation timing


Figure 10 AXI4-Stream Slave 4DW Non-Posted operation timing


Figure 11 AXI4-Stream Slave 3DW Non-Posted operation timing


3. REFERENCE DESIGN 


3.1 Reference design function introduction 


The functional block diagram of the reference design is shown in the figure below. 


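[Block diagram: the Root Complex connects to the PCIe IP in the FPGA. The PCIe IP connects over
AXI4-Stream (AXIS Master on the PCIe Rx side) to the PCIe DMA, which contains RX_ENGINE, TX_ENGINE
and State reg and connects to the user logic through the RAM interface.]


Figure 12 Reference Design Functional Block Diagram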
The reference design mainly consists of three parts: PCIe IP, PCIe DMA and user logic.
» PCIe IP: the top layer of the PCIe IP; its interface is AXI4-Stream.


» PCIe DMA: performs the PCIe DMA operations and converts the AXI4-Stream interface into a RAM read
and write interface. For the function of each module, please refer to Chapter 2.1.


» User logic: the user's logic, connected through the RAM interface.


3.2 Reference Design Interface List 


The reference design interface list is shown in Table 3.


3.3 Reference Design File Directory 


The reference design file directory is shown in the figure below.

PDS: Pango Design Suite 2021.1-SP7.3


PG2L100H PCIe DMA design example directory structure diagram:


bench                              //Simulation test bench
docs                               //Design document
ip                                 //Related IP called by the design
pnr                                //Project directory
    pango_pcie_top.fdc             //Engineering constraint reference file
    source                         //Debug Core folder
        debug_test.fic             //Debug Core debugging file
    pg2l_pcie_pio.pds              //PDS project file
simulation                         //Simulation project directory
    pango_pcie_top_filelist.f      //Design file list
    sim.bat                        //Simulation script
    pango_pcie_top_sim.do          //Simulation TCL script
    pango_pcie_top_wave.do         //Simulation waveform script
src                                //RTL files included in the design example
    pango_pcie_dma_top.v           //Top file


Figure 13 Reference Design File Directory


3.4 Reference Design Register Configuration


3.4.1 Register configuration table


The base address of the instruction registers issued by the host computer is the address of bar1,
and the offset address selects the instruction, as shown in the table below:


Table 4 Instruction Register Table

Address | Register | Width | Description

DMA MEMORY READ OPERATION
Bar1+0x100 | mrd_mem_addr | 32 | DMA memory read operation, lower 32 bits of the memory address
Bar1+0x104 | mrd_mem_addr_h | 32 | DMA memory read operation, upper 32 bits of the memory address
Bar1+0x110 | mrd_ram_addr | 32 | DMA memory read operation, initial address of the FPGA RAM, 32-bit address
Bar1+0x120 | mrd_data_length | 10 | DMA memory read data packet length, in DW
Bar1+0x130 | mrd_finish_reg | 32 | DMA memory read operation, register indicating that the data has been written to RAM.
    The CPU continuously sends the Bar1+0x130 instruction to the FPGA in a loop. After receiving this
    command, the FPGA replies with a CplD data packet: mrd_finish_reg[0]=0 while the Mrd operation is
    not completed, and mrd_finish_reg[0]=1 when the Mrd operation is completed, after which the FPGA
    clears mrd_finish_reg[0] to zero.
Bar1+0x140 | mrd_32_64_addr | 32 | Mrd address length type configuration register.
    mrd_32_64_addr[0]=0: the address length is 32 bits;
    mrd_32_64_addr[0]=1: the address length is 64 bits.

DMA MEMORY WRITE OPERATION
Bar1+0x200 | mwr_mem_addr | 32 | DMA memory write operation, lower 32 bits of the memory address
Bar1+0x204 | mwr_mem_addr_h | 32 | DMA memory write operation, upper 32 bits of the memory address
Bar1+0x210 | mwr_ram_addr | 32 | DMA memory write operation, initial address of the FPGA RAM
Bar1+0x220 | mwr_data_length | 10 | DMA memory write packet length, in DW
Bar1+0x230 | mwr_finish_reg | 32 | DMA memory write operation, register indicating that the data has been read out of RAM.
    The CPU continuously sends the Bar1+0x230 instruction to the FPGA in a loop. After receiving this
    command, the FPGA replies with a CplD data packet: mwr_finish_reg[0]=0 before the Mwr operation
    is completed, and mwr_finish_reg[0]=1 after the Mwr operation is completed, after which the FPGA
    clears mwr_finish_reg[0] to zero.
Bar1+0x240 | mwr_32_64_addr | 32 | Mwr address length type configuration register.
    mwr_32_64_addr[0]=0: the address length is 32 bits;
    mwr_32_64_addr[0]=1: the address length is 64 bits.
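
The following Python sketch shows one way host software could turn a transfer request into the register values of Table 4. It is an illustration under assumptions: the function names are hypothetical, the order of the writes is a guess rather than something the table states, and the length check assumes a supported range of 4 to 4096 bytes in whole DWs.

```python
from __future__ import annotations

# BAR1 offsets copied from Table 4.
OFFSETS = {
    "mrd": {"mem_addr": 0x100, "mem_addr_h": 0x104, "ram_addr": 0x110,
            "data_length": 0x120, "finish_reg": 0x130, "addr_32_64": 0x140},
    "mwr": {"mem_addr": 0x200, "mem_addr_h": 0x204, "ram_addr": 0x210,
            "data_length": 0x220, "finish_reg": 0x230, "addr_32_64": 0x240},
}


def bytes_to_dw(num_bytes: int) -> int:
    """Convert a byte count to DWs (assumed range: 4 to 4096 bytes)."""
    if num_bytes % 4 != 0 or not 4 <= num_bytes <= 4096:
        raise ValueError("length must be a multiple of 4 from 4 to 4096 bytes")
    # How a 1024 DW length is represented in the 10-bit register is not
    # stated in Table 4, so the plain DW count is returned here.
    return num_bytes // 4


def dma_register_writes(op: str, mem_addr: int, ram_addr: int,
                        num_bytes: int) -> list[tuple[int, int]]:
    """Return (BAR1 offset, value) pairs for one "mrd" or "mwr" transfer."""
    regs = OFFSETS[op]
    is_64bit = mem_addr > 0xFFFF_FFFF
    return [  # write order is an assumption, not taken from the source
        (regs["addr_32_64"], int(is_64bit)),
        (regs["mem_addr"], mem_addr & 0xFFFF_FFFF),
        (regs["mem_addr_h"], (mem_addr >> 32) & 0xFFFF_FFFF),
        (regs["ram_addr"], ram_addr & 0xFFFF_FFFF),
        (regs["data_length"], bytes_to_dw(num_bytes)),
    ]
```

Keeping the offsets in one dictionary lets the same function serve both directions, since the Mrd and Mwr register blocks have the same layout at 0x1xx and 0x2xx.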


3.5 Access process 


3.5.1 DMA memory read process 


The DMA memory read process is shown in the following figure: 
[Flow chart. The steps shown include:
» The CPU issues the lower 32-bit address for accessing the MEM (register: bar1 0x100);
» The CPU issues the upper 32-bit address for accessing the MEM;
» The CPU issues the FPGA RAM address to write;
» The CPU issues the length of the data packet;
» The FPGA receives the CPU's CplD data;
» The FPGA receives the Mrd operation data processing instruction in real time (register: bar1 0x130);
» The FPGA replies in real time with packets, clears registers, etc.]


Figure 14 DMA memory read flow chart
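
The completion polling on Bar1+0x130 can be sketched on the host side as follows. This is a minimal Python illustration: read_bar1 stands for whatever BAR1 read routine the host driver provides (a hypothetical callable, not a real driver API), and FakeEndpoint is a stand-in used only to exercise the loop.

```python
import time
from typing import Callable

MRD_FINISH_REG = 0x130  # BAR1 offset of mrd_finish_reg (Table 4)


def wait_mrd_done(read_bar1: Callable[[int], int], max_polls: int = 1000,
                  delay_s: float = 0.0) -> int:
    """Poll mrd_finish_reg until bit 0 reads 1; return the polls used."""
    for attempt in range(1, max_polls + 1):
        if read_bar1(MRD_FINISH_REG) & 1:
            return attempt
        if delay_s:
            time.sleep(delay_s)
    raise TimeoutError("mrd_finish_reg[0] never read back as 1")


class FakeEndpoint:
    """Stand-in for the FPGA: reports done on one read, then clears."""

    def __init__(self, done_on_read: int) -> None:
        self.done_on_read = done_on_read
        self.reads = 0

    def read(self, offset: int) -> int:
        if offset != MRD_FINISH_REG:
            return 0
        self.reads += 1
        # Bit 0 is 1 exactly once, mimicking the clear after the reply.
        return 1 if self.reads == self.done_on_read else 0
```

Because the FPGA clears mrd_finish_reg[0] after replying with 1, the host sees the completed state only once per transfer, so the loop returns on the first read that has bit 0 set. The DMA memory write flow can use the same loop with offset 0x230.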


3.5.2 DMA memory write process 


The DMA memory write process is shown in the following figure: 
[Flow chart. The steps shown include:
» The CPU issues the upper 32-bit address of the MEM (register: bar1 0x204);
» The CPU issues the address used to access and read the FPGA RAM; the bit width is 32 bits
(register: bar1 0x210);
» The FPGA writes TLPs to the host memory;
» The FPGA receives the Mwr operation data processing instruction in real time (register: bar1 0x230).]


Figure 15 DMA memory write flow chart


3.6 Reference Design Simulation 


The simulation block diagram of the reference design is shown in Figure 16.


Figure 16 Reference Design Simulation Block Diagram


The reference design simulation block diagram is divided into the Root complex and the Endpoint. For
the module description of the Endpoint, please refer to 3.1 Reference design function introduction.


The Root complex mainly consists of the following modules:
» uart2apb: interface conversion module that converts the serial port to a standard APB interface.


» PCIe_cfg_ctrl: valid only on the Root complex side. According to the user's configuration
through the APB interface, it generates configuration TLP packets that conform to the AXI4-Stream
interface timing to complete the configuration of the EP.


» PCIe_dma_pio: according to the configuration, it generates TLP packets that conform to the
AXI4-Stream interface timing to complete the DMA and PIO control functions. For the specific
register configuration, please refer to Chapter 7 of PCIe_IP_UserGuide.


» PCIe IP: the top layer of the PCIe IP; its interface is AXI4-Stream.


The simulation waveform of the reference design is shown in Figure 17.


Figure 17 Simulation Waveform


The red box marks the DMA read and write operations. During the simulation, the TLP Length is set
to 1DW, 8DW and 128DW, and DMA operations of 256DW, 768DW and 1024DW are performed. During each
DMA operation, the status register that indicates whether the read or write data transfer is
complete is answered in real time. The yellow box marks the PIO read and write operations. For the
specific simulation process, please see the pango_pcie_top_tb.v file. The sim.bat script is
provided in the reference design file directory; running sim.bat in the simulation tool produces
the simulation waveform above.


Simulation environment: third-party simulation software.
Run the script: sim.bat
Simulation filelist: pango_pcie_top_filelist.f

3.7 Reference design board


In the on-board test setup, PC software drives the RC side, and the EP side uses
the PO4I100KF01 A2 board for the docking test. Before testing on the board, you need to confirm
that the following three operations are correct:


1. Make sure the driver is installed successfully.


2. Make sure that the fdc constraints of the design project are correct. The reference design fdc
constraints file already provides Gen1x1 mode and Gen2x4 mode.


4. CONTACTS


CONTACTS FOR TECHNICAL AND COMMERCIAL QUESTIONS
OOO «MH3k»
St. Petersburg, ul. Yablochkova 20, lit. A, office 504


contact@innek.ru
+7 (812) 200-40-37